DETAILED ACTION

Claim Rejections - 35 USC § 103 

1. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all
obviousness rejections set forth in this Office action:

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

2. Claims 1 to 5, 21 to 25, 30 to 32, 37 to 39, 48 to 52, 68 to 73, 77 to 79, 84 to 86,
90, 97, 99, and 101 are rejected under 35 U.S.C. 103(a) as being unpatentable over
Chou et al. in view of Goldberg et al. ('158).

Concerning independent claims 1, 48, 97, 99, and 101, Chou et al. discloses an
apparatus, method, and instructions, comprising: 

"a receiver operable to receive an input signal" - an input of an unknown speech 
string 18 (an utterance) of words is received from a microphone (column 4, lines 34 to 
35: Figure 1); 

"a recognition processor operable to compare said input signal with stored label 
models to generate a recognized sequence of labels in said input signal and confidence 
data representative of the confidence that the recognized sequence of labels is 
representative of said input signal" - recognition processor 10 receives the input, 
accesses the recognition database 12, scores the unknown speech string of words 
against the recognition models in the recognition database 12, and generates a
hypothesis string signal 20; verification processor 16 receives the hypothesis string, and
generates a confidence measure signal 22 (column 4, lines 34 to 51: Figure 1); 

"a similarity measure calculator operable to compare said recognized sequence 
of labels received from said recognition processor with a stored sequence of labels 
using a combination of i) predetermined confusion data which defines confusability 
between different labels, and ii) said confidence data received from the recognition 
processor and representative of the confidence that said received recognized sequence 
of labels is representative of the input signal, to provide a measure of the similarity 
between the recognized sequence of labels and the stored sequence of labels" - 
confidence score computation ("a similarity measure calculator") for a speech segment 
q relates a comparison between a word model score ("said confidence data") and 
scores computed with the anti-word model ("predetermined confusion data which 
defines confusability between different labels"); in Equation (2), $L(O_q; \Theta, l)$ is "the
measure of similarity" calculated by the similarity measure calculator, $g_l(O_q) = \log p(O_q \mid \theta_l^{(k)})$
is "the confidence data" for the keyword hypothesis $\{\theta_l^{(k)}\}$, and $G_l(O_q)$ is the
"predetermined confusion data which defines confusability" for anti-keywords $\{\theta_l^{(a)}\}$,
which handle confusability among keywords (column 8, lines 33 to 55: Figure 2).

Concerning independent claims 1, 48, 97, 99, and 101, Chou et al. discloses
scores computed with an anti-word model in Equation (2), where, arguably, $G_l(O_q)$ is the
"predetermined confusion data which defines confusability" for anti-keywords $\{\theta_l^{(a)}\}$,
which handle confusability among keywords (column 8, lines 33 to 55: Figure 2).
However, even assuming arguendo that "predetermined confusion data which defines
confusability between labels" is not clearly disclosed by Chou et al., it is still well known
to account for confusability between similarly sounding letters in speech recognition. 
Specifically, Goldberg et al. ('158) teaches a statistical option generator for speech 
recognition, where confusion sets are defined for groups of letters that have a certain 
probability of being confused with one another as represented by confusion matrices. 
(Column 3, Line 55 to Column 4, Line 10) Letters "A", "J", and "K" are grouped together 
in Confusion Set 1, letters "B", "C", "D", "E", "P", "T", and "V" are grouped together in 
Confusion Set 2, etc. (Column 8, Lines 14 to 25: Figure 4) An objective is to reduce the 
time for matching an input identifier and conserving computing power by eliminating 
option identifiers from a candidate set of reference identifiers. (Column 3, Lines 37 to 
54) It would have been obvious to one having ordinary skill in the art to take into 
account confusion probabilities as taught by Goldberg et al. ('158) in a speech 
recognition system and method of Chou et al. for a purpose of reducing time for 
matching and conserving computing power. 
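
As a rough illustration of the confusion-set grouping described for Goldberg et al. ('158), the following Python sketch encodes only the two sets named above (Confusion Set 1 and Confusion Set 2) and discards candidate identifiers that contain a letter not confusable with the input letter at the same position. The function names, the pruning rule, and the sample identifiers are assumptions made for illustration and are not taken from the reference.

```python
# Hypothetical sketch: only the two confusion sets quoted above are encoded.
CONFUSION_SETS = [
    {"A", "J", "K"},                      # Confusion Set 1
    {"B", "C", "D", "E", "P", "T", "V"},  # Confusion Set 2
]


def confusable(a: str, b: str) -> bool:
    """Return True if two letters are identical or share a confusion set."""
    if a == b:
        return True
    return any(a in s and b in s for s in CONFUSION_SETS)


def prune_candidates(recognized: str, candidates: list[str]) -> list[str]:
    """Keep candidates whose every letter is confusable with the
    recognized letter at the same position (assumed pruning rule)."""
    kept = []
    for cand in candidates:
        if len(cand) == len(recognized) and all(
            confusable(r, c) for r, c in zip(recognized, cand)
        ):
            kept.append(cand)
    return kept
```

Under these assumptions, a recognized identifier "AB" would retain the candidates "KD" and "JP" and discard "BA", which shrinks the candidate set before any detailed matching is attempted.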

Concerning claims 2 and 49, Chou et al. discloses the confidence measure is 
generated based upon data stored in verification database 16 (column 4, lines 34 to 51: 
Figure 1). 

Concerning claims 3 and 50, Chou et al. discloses a word-based confidence 
score 34 (column 8, lines 40 to 55: Figure 2); each word is a "label" in a string of words 
being recognized.

Concerning claims 4 and 51, Chou et al. discloses string models are generated in
an "N" best list ("a list of alternatives") by N-best string model generator 46 (column 6, 
line 45 to column 7, line 53: Figure 2; column 9, line 54 to column 10, line 14). 

Concerning claims 5 and 52, Chou et al. discloses Viterbi alignment ("an aligner") 
of the input string, O, against the model sets for each given word string in the N-best 
string list (column 7, lines 8 to 15); average word-based confidence score processor 36 
("a combiner") performs mathematical averaging for each word segment signal of the 
hypothesis string to generate an average word-based confidence score signal ("said 
similarity measure") (column 5, lines 53 to 67: Figure 2). 
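
The averaging step attributed to processor 36 can be pictured with a short Python sketch. It assumes that each word segment of the hypothesis string already carries a numeric confidence score; the function name and any sample values are hypothetical.

```python
def average_confidence(word_scores: list[float]) -> float:
    """Arithmetic mean of the per-word confidence scores of one hypothesis string."""
    if not word_scores:
        raise ValueError("hypothesis string has no word segments")
    return sum(word_scores) / len(word_scores)
```

Because the sum is divided by the number of word segments, hypothesis strings of different lengths yield scores on a comparable scale.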

Concerning claims 21 and 68, Chou et al. discloses Viterbi alignment ("an
aligner") of the input string, O, against the model sets for each given word string in the
N-best string list (column 7, lines 8 to 15); Viterbi alignment is "a dynamic programming 
technique". 

Concerning claims 22 to 25, 69 to 72, and 90, Chou et al. discloses Viterbi 
alignment ("an aligner") of the input string, O, against the model sets for each given 
word string in the N-best string list (column 7, lines 8 to 15); implicitly, Viterbi alignment 
determines "progressively a plurality of possible alignments", generates scores for each 
given word in the N-best list, determines "an optimum alignment", and "combines the 
scores" for each word in the word string. 

Concerning claims 30 to 32 and 77 to 79, Chou et al. discloses the input string is
speech (column 4, lines 34 to 35: Figure 1), which is a time sequential audio signal of 
words. 



Application/Control Number: 09/695,077 Page 6 

Art Unit: 2626 

Concerning claims 37, 38, 84, and 85, Chou et al. discloses confidence score 
computation for a speech segment q relates a comparison between a word model score 
("said confidence data") and scores computed with the anti-word model ("said confusion 
data"); in Equation (2), $L(O_q; \Theta, l)$ is "the measure of similarity" calculated by the similarity
measure calculator, $g_l(O_q) = \log p(O_q \mid \theta_l^{(k)})$ is "the confidence data" for the keyword
hypothesis $\{\theta_l^{(k)}\}$, and $G_l(O_q)$ is "the confusion data" for anti-keywords $\{\theta_l^{(a)}\}$, which
handle confusability among keywords (column 8, lines 33 to 55: Figure 2).

Concerning claims 39 and 86, Chou et al. discloses an average confidence score
based upon the average of word-based confidence scores (column 5, lines 53 to 67);
an average confidence score is a normalization from each of the word-based 
confidence scores. 

Concerning claim 73, Chou et al. discloses each of the words ("labels") in the 
unknown speech string ("each of the labels in said recognized sequence of labels") is 
scored against recognition models ("stored sequences of labels") in the recognition 
database 12 (column 4, lines 34 to 51: Figure 1). 

3. Claims 20, 35, 36, 46, 67, 82, 83, 93, 98, 100, and 102 are rejected under 35 
U.S.C. 103(a) as being unpatentable over Chou et al. in view of Goldberg et al. ('158)
as applied to claims 1, 5, 48, 52, 97, 99, and 101 above, and further in view of Aref et al.

Concerning claims 20 and 67, Chou et al. omits an aligner operable to identify 
deletions and insertions. However, Aref et al. teaches an analogous art speech
recognition system for correcting misspelled words in a string of text. (Column 1, Lines
32 to 52) Specifically, Aref et al. discloses detecting recognition errors as models from
insertion errors and deletion errors. (Column 3, Lines 36 to 60) It is suggested that
there are advantages to speed the search process and reduce the size of the database
by correcting misrecognized or misspelled words with the search technique of Aref et al.
(Column 1, Lines 52 to 61) It would have been obvious to one having ordinary skill in
the art to incorporate the insertion and deletion error technique of Aref et al. into the
word-based confidence score method of Chou et al. for the purpose of correcting 
misrecognitions with a high speed search process and reduced database size. 
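
To make the notion of an aligner that identifies insertions and deletions concrete, the sketch below aligns two label sequences with a plain edit-distance dynamic program and reports each operation. This is a generic textbook technique offered only as an illustration; it is not asserted to be the method of Aref et al. or the Viterbi alignment of Chou et al., and the function name and unit edit costs are assumptions.

```python
def align(recognized: str, stored: str) -> list[tuple[str, str, str]]:
    """Align two label sequences by edit distance and list the operations.

    Returns (op, recognized_label, stored_label) tuples, where op is
    "match", "substitute", "insert" (extra label in recognized) or
    "delete" (label missing from recognized). Unit costs are assumed.
    """
    n, m = len(recognized), len(stored)
    # cost[i][j] = minimum edits to align recognized[:i] with stored[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if recognized[i - 1] == stored[j - 1] else 1
            cost[i][j] = min(
                cost[i - 1][j - 1] + sub,  # match or substitution
                cost[i - 1][j] + 1,        # insertion in recognized
                cost[i][j - 1] + 1,        # deletion from recognized
            )
    # Trace back from the bottom-right corner to recover one optimum alignment.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = 0 if recognized[i - 1] == stored[j - 1] else 1
            if cost[i][j] == cost[i - 1][j - 1] + sub:
                op = "match" if sub == 0 else "substitute"
                ops.append((op, recognized[i - 1], stored[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            ops.append(("insert", recognized[i - 1], ""))
            i -= 1
        else:
            ops.append(("delete", "", stored[j - 1]))
            j -= 1
    ops.reverse()
    return ops
```

For example, aligning the illustrative strings "airnmail" and "airmail" yields a single "insert" operation for the extra letter "n", with every other position reported as a match.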

Concerning claims 35, 36, 82, and 83, Chou et al. omits mis-typing probabilities
and mis-spelling probabilities based upon sub-word units. However, Aref et al. teaches
an analogous art speech recognition system for correcting misspelled words in a string
of text. (Column 1, Lines 32 to 52) Specifically, Aref et al. discloses probabilities for
letters being recognized incorrectly, where letters are sub-word units, to estimate a
measure of similarity between two words. (Column 4, Lines 1 to 59) Recognition errors
are based upon typing errors, e.g. "airnmail" is mistakenly inserted for the word
"airmail". (Column 3, Lines 36 to 53) It is suggested that there are advantages to
speed the search process and reduce the size of the database by correcting
misrecognized or misspelled words with the search technique of Aref et al. (Column 1,
Lines 52 to 61) It would have been obvious to one having ordinary skill in the art to
utilize the mis-typing and mis-spelling technique of sub-word units taught by Aref et al.
in the word-based confidence score method of Chou et al. for the purpose of
correcting misrecognitions with a high speed search process and reduced database
size.

Concerning claims 46, 93, 98, 100, and 102, Chou et al. omits an application of
speech recognition to querying a database and obtaining information from the database,
although this is a well known application for speech recognition systems, generally.
However, Aref et al. teaches an analogous art speech recognition system for searching
a database for recognized text by querying keywords. (Column 2, Lines 41 to 50) It is
suggested that there are advantages to speed the search process and reduce the size
of the database by correcting misrecognized or misspelled words with the search
technique of Aref et al. (Column 1, Lines 52 to 61) It would have been obvious to one
having ordinary skill in the art to apply the word-based confidence score method of
Chou et al. to a retrieval system from a database of automatically recognized text as
taught by Aref et al. for the purpose of correcting misrecognitions with a high speed
search process and reduced database size. 

4. Claims 33, 34, 80, and 81 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over Chou et al. in view of Goldberg et al. ('158) as applied to claims 1,
31, 32, 48, 79, and 80 above, and further in view of Wheatley et al.

Chou et al. discloses recognizing speech with word-based confidence scores, 
where the labels are words, but omits recognizing sub-word units and phonemes. 
However, it is a well-known, art-recognized alternative in speech recognition to recognize
phonemes, which are sub-word units, rather than words. Wheatley et al. teaches a
related apparatus and method for speech recognition, where speech is recognized with
Hidden Markov Models representing phonetic units instead of words. (Column 7, Lines
14 to 37) It is suggested that there is an advantage of representing real world,
unscripted conversations. (Column 2, Lines 28 to 39) It would have been obvious to
utilize sub-word phonetic units for the speech recognition system of Chou et al. as
suggested by Wheatley et al. for the purpose of better recognizing real world, unscripted
conversations. 

5. Claims 40, 41, 87, and 88 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over Chou et al. in view of Goldberg et al. ('158) as applied to claims 1,
37, 39, 48, 84, and 86 above, and further in view of Glenn et al.

Concerning claims 40 and 87, Chou et al. discloses an average confidence score
based upon the average of word-based confidence scores (column 5, lines 53 to 67),
where an average confidence score is a normalization from each of the word-based 
confidence scores, but does not expressly say that the normalization is obtained by 
dividing each similarity measure by a respective normalization score which varies in 
dependence upon the length of the corresponding stored sequence. However, Glenn et
al. teaches a speech recognition system and method, where a pattern classifier 50
matches bits between an event coder output and candidate reference patterns from a
reference pattern memory 60. Counter 55 represents the number of bits that match
between the event coder output and the reference pattern, where the count of matching
bits is denoted by $C_k$, and the maximum output from counter 55 is $N_k$, which represents
a perfect match. Score computer 56 calculates a value $S_k = M(C_k/N_k)$, which is the
pattern classification score, and M is a constant equal to the maximum score possible.
The ratio $C_k/N_k$ "normalizes" the scores obtained to account for the consistency of the
training patterns. (Column 6, Lines 31 to 68: Figure 1) Value $S_k$ is a score, or "similarity
measure", that is normalized "by dividing by a respective normalization score which
varies in dependence upon the length of the corresponding stored sequence" because $N_k$
represents the length of the compared patterns, as the maximum number of matching
bits is the length of the corresponding patterns, and score $S_k$ is obtained by dividing a
count of matching bits, $C_k$, by a maximum number of matching bits, or length, $N_k$. Glenn
et al. states that improved acoustic recognition is obtained by "normalizing" the scores 
to account for the consistency of the training patterns. (Column 6, Lines 66 to 68) It 
would have been obvious to one having ordinary skill in the art to normalize the score of 
similarity measures by dividing a similarity measure by a normalization score that varies 
in dependence upon the length of the stored sequence as taught by Glenn et al. in a 
speech recognition system and method of Chou et al. for a purpose of improving 
speech recognition by accounting for the consistency of training patterns. 
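
The score $S_k = M(C_k/N_k)$ described for Glenn et al. can be worked through in a few lines of Python. The sketch assumes that patterns are equal-length bit strings, that $N_k$ equals the pattern length (the count a perfect match would produce), and that M defaults to 100; the function name, the string representation, and the default value are illustrative choices, not details from the reference.

```python
def classification_score(event_bits: str, reference_bits: str,
                         max_score: float = 100.0) -> float:
    """Compute S_k = M * (C_k / N_k) for one reference pattern.

    C_k is the count of matching bit positions; N_k, taken here as the
    pattern length, is the count a perfect match would produce.
    """
    if len(event_bits) != len(reference_bits) or not reference_bits:
        raise ValueError("patterns must be non-empty and of equal length")
    c_k = sum(1 for e, r in zip(event_bits, reference_bits) if e == r)
    n_k = len(reference_bits)
    return max_score * c_k / n_k
```

With a four-bit reference, three matching bits give 100 × 3/4 = 75; with an eight-bit reference, six matching bits give the same score. Dividing by $N_k$ is what keeps scores for reference patterns of different lengths on one scale.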

Concerning claims 41 and 88, Glenn et al. discloses that the normalization
scores, $N_k$, are generated by a maximum number of matching bits between the event
encoding output and the candidate reference patterns stored in a reference pattern
memory 60. (Column 6, Lines 31 to 65: Figure 1) The candidate reference patterns
stored in reference pattern memory 60 are "a stored sequence of annotation labels".
Implicitly, each reference pattern has its own characteristic length, so that, in general, a
normalization score, $N_k$, varies for each reference pattern.

Allowable Subject Matter 

6. Claims 6 to 19, 26 to 29, 42 to 45, 47, 53 to 66, 74 to 76, 89, 91 to 92, and 94 to 
95 are objected to as being dependent upon a rejected base claim, but would be 
allowable if rewritten in independent form including all of the limitations of the base 
claim and any intervening claims. 

Response to Arguments 

7. Applicants' arguments filed 28 December 2005 have been considered but are 
moot in view of the new grounds of rejection. 

Conclusion 

8. The prior art made of record and not relied upon is considered pertinent to 
Applicants' disclosure. 

Goldberg ('261) and Brown et al. disclose related art.

Any inquiry concerning this communication or earlier communications from the
examiner should be directed to Martin Lerner whose telephone number is (571) 272-7608.
The examiner can normally be reached on 8:30 AM to 6:00 PM Monday to
Thursday.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's
supervisor, David R. Hudspeth, can be reached on (571) 272-7843. The fax phone
number for the organization where this application or proceeding is assigned is
571-273-8300.

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
USPTO Customer Service Representative or access to the automated information 
system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.